
    Decision Tree-based Syntactic Language Modeling

    Statistical Language Modeling is an integral part of many natural language processing applications, such as Automatic Speech Recognition (ASR) and Machine Translation. N-gram language models dominate the field, despite having an extremely shallow view of language: a Markov chain of words. In this thesis, we develop and evaluate a joint language model that incorporates syntactic and lexical information in an effort to "put language back into language modeling." Our main goal is to demonstrate that such a model is not only effective but can be made scalable and tractable. We utilize decision trees to tackle the problem of sparse parameter estimation, which is exacerbated by the use of syntactic information jointly with word context. While decision trees have been previously applied to language modeling, there has been little analysis of the factors affecting decision tree induction and probability estimation for language modeling. In this thesis, we analyze several aspects that affect decision tree-based language modeling, with an emphasis on syntactic language modeling. We then propose improvements to the decision tree induction algorithm based on our analysis, as well as methods for constructing forest models (models consisting of multiple decision trees). Finally, we evaluate the impact of our syntactic language model on large-scale Speech Recognition and Machine Translation tasks. We also address a number of engineering problems associated with the joint syntactic language model in order to make it tractable. In particular, we propose a novel decoding algorithm that exploits the decision tree structure to eliminate unnecessary computation. We also propose and evaluate an approximation of our syntactic model by word n-grams, an approximation that makes it possible to incorporate our model directly into the CDEC Machine Translation decoder rather than using the model only for rescoring hypotheses produced with an n-gram model.
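
    The code below is a minimal, hedged sketch of the core idea: a decision tree estimating next-word probabilities from joint lexical and syntactic context. It uses toy data and scikit-learn as a stand-in for the thesis's own induction algorithm; every name in it is illustrative.

```python
# Minimal, hypothetical sketch (toy data; scikit-learn stands in for the
# thesis's own induction algorithm): estimate P(next word | previous word,
# previous POS tag), conditioning jointly on lexical and syntactic context.
from sklearn.tree import DecisionTreeClassifier

# Toy training events: (previous word, previous POS tag) -> next word.
corpus = [
    (("the", "DT"), "dog"),
    (("the", "DT"), "cat"),
    (("a", "DT"), "dog"),
    (("dog", "NN"), "barks"),
    (("cat", "NN"), "sleeps"),
]
words = sorted({w for (w, _), nxt in corpus} | {nxt for _, nxt in corpus})
tags = sorted({t for (_, t), _ in corpus})
w2i = {w: i for i, w in enumerate(words)}
t2i = {t: i for i, t in enumerate(tags)}

X = [[w2i[w], t2i[t]] for (w, t), _ in corpus]
y = [w2i[nxt] for _, nxt in corpus]

# The tree's splits cluster sparse joint (word, tag) contexts, and each leaf
# holds a probability distribution over next words, which is the role decision
# trees play in the model described above. (Numeric splits over integer ids
# are a simplification of question-based induction.)
tree = DecisionTreeClassifier(min_samples_leaf=1).fit(X, y)
for label, p in zip(tree.classes_, tree.predict_proba([[w2i["the"], t2i["DT"]]])[0]):
    if p > 0:
        print(words[label], p)  # e.g. cat 0.5 / dog 0.5 at this leaf
```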

    Streaming Speech-to-Confusion Network Speech Recognition

    In interactive automatic speech recognition (ASR) systems, low-latency requirements limit the amount of search space that can be explored during decoding, particularly in end-to-end neural ASR. In this paper, we present a novel streaming ASR architecture that outputs a confusion network while maintaining limited latency, as needed for interactive applications. We show that the 1-best results of our model are on par with those of a comparable RNN-T system, while the richer hypothesis set allows second-pass rescoring to achieve 10-20% lower word error rate on the LibriSpeech task. We also show that our model outperforms a strong RNN-T baseline on a far-field voice assistant task.
    Comment: Submitted to Interspeech 202
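
    As a rough illustration of the data structure involved (toy values and a stub LM, not the paper's architecture), the sketch below represents a confusion network as a sequence of word/posterior slots and rescores complete paths with an external LM score.

```python
# Toy illustration: a confusion network as a sequence of slots, each holding
# alternative words with posterior probabilities. Second-pass rescoring
# combines slot posteriors with an external LM score over complete paths.
from itertools import product
import math

confusion_net = [
    [("i", 0.9), ("a", 0.1)],
    [("like", 0.6), ("bike", 0.4)],
    [("speech", 0.8), ("peach", 0.2)],
]

def toy_lm_score(words):
    # Stand-in for a second-pass LM; here just a bigram bonus.
    bonus = {("i", "like"): 1.0, ("like", "speech"): 1.0}
    return sum(bonus.get(bg, 0.0) for bg in zip(words, words[1:]))

def rescore(net, lm_weight=0.5):
    # The 1-best path uses only slot posteriors; rescoring can promote a
    # different path from the richer hypothesis set the network encodes.
    best, best_score = None, -math.inf
    for path in product(*net):
        hyp = [w for w, _ in path]
        score = sum(math.log(p) for _, p in path) + lm_weight * toy_lm_score(hyp)
        if score > best_score:
            best, best_score = hyp, score
    return best

print(rescore(confusion_net))  # ['i', 'like', 'speech']
```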

    PROCTER: PROnunciation-aware ConTextual adaptER for personalized speech recognition in neural transducers

    End-to-End (E2E) automatic speech recognition (ASR) systems used in voice assistants often have difficulty recognizing infrequent words personalized to the user, such as names and places. Rare words often have non-trivial pronunciations, and in such cases, human knowledge in the form of a pronunciation lexicon can be useful. We propose a PROnunciation-aware ConTextual adaptER (PROCTER) that dynamically injects lexicon knowledge into an RNN-T model by adding a phonemic embedding along with a textual embedding. The experimental results show that the proposed PROCTER architecture outperforms the baseline RNN-T model by improving the word error rate (WER) by 44% and 57% when measured on personalized entities and personalized rare entities, respectively, while increasing the model size (number of trainable parameters) by only 1%. Furthermore, when evaluated in a zero-shot setting to recognize personalized device names, we observe a 7% WER improvement with PROCTER, as compared to only a 1% WER improvement with text-only contextual attention.
    Comment: To appear in Proc. IEEE ICASSP
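
    The sketch below illustrates the general mechanism with a hypothetical module (invented names and shapes, not the PROCTER implementation): catalog entries carry both a textual and a phonemic embedding, and attention over them biases the encoder state.

```python
# Minimal sketch (hypothetical module and shapes): bias an encoder state
# toward catalog entries, where each entry is represented by a textual plus
# a phonemic embedding.
import torch
import torch.nn as nn

class PronunciationAwareAdapter(nn.Module):
    def __init__(self, d_model=256, d_emb=128):
        super().__init__()
        self.query = nn.Linear(d_model, d_emb)
        self.out = nn.Linear(d_emb, d_model)

    def forward(self, enc_state, text_emb, phone_emb):
        keys = text_emb + phone_emb               # (entries, d_emb): joint view
        q = self.query(enc_state)                 # (batch, d_emb)
        attn = torch.softmax(q @ keys.T, dim=-1)  # attend over catalog entries
        return enc_state + self.out(attn @ keys)  # biased encoder state

adapter = PronunciationAwareAdapter()
enc = torch.randn(2, 256)     # toy encoder states
text = torch.randn(10, 128)   # 10 catalog entries, textual embeddings
phone = torch.randn(10, 128)  # same entries, phonemic embeddings
print(adapter(enc, text, phone).shape)  # torch.Size([2, 256])
```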

    Low-rank Adaptation of Large Language Model Rescoring for Parameter-Efficient Speech Recognition

    We propose a neural language modeling system based on low-rank adaptation (LoRA) for speech recognition output rescoring. Although pretrained language models (LMs) like BERT have shown superior performance in second-pass rescoring, the high computational cost of scaling up the pretraining stage and of adapting the pretrained models to specific domains limits their practical use in rescoring. Here we present a method based on low-rank decomposition to train a rescoring BERT model and adapt it to new domains using only a fraction (0.08%) of the pretrained parameters. The inserted low-rank matrices are optimized through a discriminative training objective along with a correlation-based regularization loss. The proposed low-rank adaptation Rescore-BERT (LoRB) architecture is evaluated on LibriSpeech and internal datasets, with training times reduced by factors of 3.6 to 5.4.
    Comment: Accepted to IEEE ASRU 2023. 8 pages
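
    For concreteness, the sketch below shows the LoRA mechanism on a single linear layer (illustrative hyperparameters and a random stand-in weight, not the LoRB configuration): the pretrained weight is frozen and only the inserted low-rank factors are trained.

```python
# Minimal sketch of the LoRA mechanism behind the abstract: the pretrained
# weight W is frozen; only the low-rank factors A and B are trained.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, d_in, d_out, rank=8, alpha=16.0):
        super().__init__()
        # Frozen "pretrained" weight (random here for illustration).
        self.weight = nn.Parameter(torch.randn(d_out, d_in), requires_grad=False)
        self.A = nn.Parameter(torch.randn(rank, d_in) * 0.01)  # trainable
        self.B = nn.Parameter(torch.zeros(d_out, rank))        # trainable, zero-init
        self.scale = alpha / rank

    def forward(self, x):
        # W x plus the scaled low-rank update (B A) x.
        return x @ self.weight.T + self.scale * (x @ self.A.T @ self.B.T)

layer = LoRALinear(768, 768)
_ = layer(torch.randn(4, 768))  # works as a drop-in linear layer
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
total = sum(p.numel() for p in layer.parameters())
print(f"trainable fraction: {trainable / total:.4f}")  # small, as in the abstract
```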

    Thermal Denaturation and Aggregation of Myosin Subfragment 1 Isoforms with Different Essential Light Chains

    We compared thermally induced denaturation and aggregation of two isoforms of the isolated myosin head (myosin subfragment 1, S1) containing different "essential" (or "alkali") light chains, A1 or A2. We applied differential scanning calorimetry (DSC) to investigate the domain structure of these two S1 isoforms. For this purpose, a special calorimetric approach was developed to analyze the DSC profiles of irreversibly denaturing multidomain proteins. Using this approach, we revealed two calorimetric domains in the S1 molecule, with the more thermostable domain denaturing in two steps. Comparing the DSC data with temperature dependences of intrinsic fluorescence parameters and of S1 ATPase inactivation, we identified these two calorimetric domains as the motor domain and the regulatory domain of the myosin head, the motor domain being the more thermostable. A difference between the two S1 isoforms was revealed by DSC only in the thermal denaturation of the regulatory domain. We also applied dynamic light scattering (DLS) to analyze the aggregation of the S1 isoforms induced by their thermal denaturation. We found no appreciable difference between these S1 isoforms in their aggregation properties under ionic strength conditions close to those in the muscle fiber (in the presence of 100 mM KCl). Under these conditions, the kinetics of the aggregation process was independent of protein concentration, and the aggregation rate was limited by the irreversible denaturation of the S1 motor domain.
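
    For intuition about the kind of DSC profile such an approach deconvolves, the toy simulation below (illustrative parameters, not the paper's fitted values) generates the excess heat capacity of two independent, irreversibly denaturing domains at a constant scan rate.

```python
# Toy simulation: excess heat capacity of two independent calorimetric
# domains that denature irreversibly during a constant-rate DSC scan, the
# kind of multidomain profile the calorimetric approach above deconvolves.
import numpy as np

R = 8.314        # gas constant, J/(mol K)
v = 1.0 / 60.0   # scan rate, K/s (1 K/min)
T = np.linspace(300.0, 350.0, 2000)  # temperature grid, K
dT = T[1] - T[0]

def domain_cp(Ea, lnA, dH):
    # One-step irreversible denaturation N -> D with Arrhenius rate k(T):
    # dn/dT = -k(T) n / v, and excess Cp = dH * (-dn/dT).
    k = np.exp(lnA - Ea / (R * T))          # rate constant, 1/s
    n = np.exp(-np.cumsum(k / v) * dT)      # fraction still native
    return dH * k * n / v                   # excess heat capacity, J/(mol K)

# Two hypothetical domains; the second (the "motor" domain, in the abstract's
# terms) is the more thermostable one.
cp = domain_cp(Ea=3.0e5, lnA=110.0, dH=4.0e5) + domain_cp(Ea=3.2e5, lnA=114.0, dH=6.0e5)
print(f"peak excess Cp at {T[np.argmax(cp)]:.1f} K")
```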

    Recovery of Empty Nodes in Parse Structures

    In this paper, we describe a new algorithm for recovering WH-trace empty nodes. Our approach combines a set of hand-written patterns with a probabilistic model. Because the patterns heavily utilize regular expressions, the pertinent tree structures are covered using a limited number of patterns. The probabilistic model is essentially a probabilistic context-free grammar (PCFG) approach with the patterns acting as the terminals in production rules. We evaluate the algorithm's performance on gold trees and parser output using three different metrics. Our method compares favorably with state-of-the-art algorithms that recover WH-traces.
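
    The sketch below gives a much-simplified picture of the pattern-plus-probability idea (hypothetical patterns and probabilities, not the paper's grammar): regular expressions over bracketed trees act as terminals, each carrying a probability for inserting a WH-trace.

```python
# Minimal sketch: regular-expression patterns over flattened bracketed trees
# act as "terminals", each with a probability for inserting a WH-trace (*T*).
import re

PATTERNS = [
    # (pattern over a flattened tree, probability of inserting a trace there)
    (re.compile(r"\(SBAR \(WHNP.*?\)\) \(S \(NP.*?\)\) \(VP \w+"), 0.9),
    (re.compile(r"\(SBAR \(WHADVP.*?\)\) \(S"), 0.6),
]

def recover_traces(tree):
    # Pick the highest-probability matching pattern; the paper instead scores
    # pattern choices with PCFG-style production probabilities.
    best = None
    for pat, prob in PATTERNS:
        m = pat.search(tree)
        if m and (best is None or prob > best[1]):
            best = (m, prob)
    if best is None:
        return tree
    m, _ = best
    # Insert the empty node after the matched VP head (simplified placement).
    return tree[:m.end()] + " (-NONE- *T*)" + tree[m.end():]

print(recover_traces("(SBAR (WHNP (WP who)) (S (NP (PRP I)) (VP saw)))"))
# (SBAR (WHNP (WP who)) (S (NP (PRP I)) (VP saw (-NONE- *T*))))
```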